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ABSTRACT 

The Taiwanese- American Occultation Survey (TAOS) monitors fields of up to ^1000 stars at 5 Hz 
simultaneously with four small telescopes to detect occultation events from small (~1 km) Kuiper 
Belt Objects (KBOs). The survey presents a number of challenges, in particular the fact that the 
occultation events we are searching for are extremely rare and are typically manifested as slight flux 
drops for only one or two consecutive time series measurements. We have developed a statistical 
analysis technique to search the multi-telescope data set for simultaneous flux drops which provides 
a robust false positive rejection and calculation of event significance. In this paper, we describe in 
detail this statistical technique and its application to the TAOS data set. 
Subject headings: Solar System, Astronomical Techniques, Data Analysis and Techniques 



1. INTRODUCTION 

The Taiwanese- Amer ican Occultation Su r vey operates 
four s m all telescopes (Bianco et al.l 120101: iWang et al.l 
I2009bt iZhang et al.l 120081: iLehner et al.l I2009T) at 
Lulin Observatory in central Taiwan to search for 
occultations by sma l l (~1 km diameter) KBOs 
(iSchlichting et al.ll2009 l: |Wan 
20091: iBickerton et all 1200 ' 
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Chang et all 120071 IRoaues et all 120061: iCooravl 12003 



2007 



Coorav fc Farmerll2003l: IRoaues et alll2003l ). Such occul 
tation events are extremely rare (estimated rates range 
from 10~ 4 to 10 -2 events per star per year), they are 
very short in duration (<200 ms), and at the 5 Hz ob- 
serving cadence used by TAOS, they result in measured 
flux drops of typically <30% in one or two consecutive 
points. This presents a number of challenges, in particu- 
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lar the identification of false positive events of statistical 
origin and candidate events which are in fact of terres- 
trial origin (e.g. birds, airplanes, and extreme scintil- 
lation events). We reject these false positive events by 
requiring simultaneous detection in multiple telescopes. 

A second challenge is finding a robust method to deter- 
mine the statistical significance of any candidate events. 
The noise distribution of each lightcurve is not known a 
priori, due to non-Poisson and non-Gaussian processes 
on the tails of the flux distributions. The typical stars in 
our fields have magnitudes R ~ 13 and a signal to noise 
ratio (SNR) of ~10. Moreover, TAOS monitors fields for 
durations of up to 1.5 hours, and changes in atmospheric 
transparency and airmass introduce further uncertainties 
into the flux measurements. 

To overcome these difficulties, we have developed non- 
parametric techniques using rank statistics. Rank statis- 
tics facilitate a simultaneous analysis of multi-telescope 
photometric measurements to enable a robust determi- 
nation of event significance and false positive rejection, 
which are independent of the underlying noise distribu- 
tions of the lightcurves being analyzed. Rank statistics 
and their application to the TAOS lightcurves will be 
described in the following sections. In Sj3]we review the 
characteristics of occultation events and describe how 
such events would appear in the data. In $3] we discuss 
the rank product statistical test used to calculate event 
significance and the false positive rate. In jJHwe describe 
the lightcurve filtering techniques and diagnostic tests 
used to ensure that the rank product statistical test is 
valid, and in $5] we describe a new and more robust set 
of diagnostics tests. 

The following definitions apply throughout the remain- 
der of this paper. We define a data run as a consecu- 
tive series of multi-telescope observations of a given star 
field made at a cadence of 5 Hz. Typical data runs last 
1.5 hours, comprising 27,000 time series images on each 
telescope. We define a lightcurve set as a set of multi- 
telescope lightcurves of a single star during a given data 
run. There are typically 300-500 stars in an image, and 
hence 300-500 lightcurve sets in a data run. We adopt 
the standard statistical notation wherein we denote a 
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random variable with an upper case letter, and use the 
corresponding lower case letter for an actual value for 
that variable (e.g. Z is a random variable which could 
take on a value of z). We use the function pQ to de- 
scribe a probability density distribution, and PQ to de- 
scribe an actual probability. Finally, we note that the 
four telescopes are labeled TAOS A, TAOS B, TAOS C, 
and TAOS D. TAOS C came online in August of 2008, 
and to date no data from this telescope has been ana- 
lyzed. All example lightcurves shown in this paper come 
from telescopes A, B and D. 

2. OCCULTATIONS BY KUIPER BELT OBJECTS 

An occultation event occurs when an object passes be- 
tween the telescope and a distant star dBickerton et al.l 
[20091: iNihei et al J 120071: iRoaues et all 12003ft . The Earth 
and the occulting object are in relative motion, inducing 
a variation in the measured stellar flux over time. The 
target population for TAOS is small (~1 km diameter) 
KBOs, whose sizes are on the order of the Fresnel scale, 
which is given by 



F = 



where A is the wavelength of observation and A is the 
observer-KBO distance. For TAOS, the median wave- 
length of observation is A ~ 600 nm, and the typical dis- 
tance to KBOs is 43 AU, resulting in F = 1.4 km. Occul- 
tation events by KBOs with diameters D < 10 km thus 
show significant diffraction effects. This is illustrated in 
the left panel of Fig.Q] which shows a simulated occul- 
tation "shadow" from a 3 km diameter KBO projected 
onto the surface of the Earth. 

The timescale of an occultation event is set by the rel- 
ative velocity between the KBO and observer, the size of 
the occultation shadow, and the impact parameter (min- 
imum distance between the KBO and the line of sight to 
the target star) . The relative velocity between the Earth 
and KBO is given by 



event cross section is set by the Fresnel scale: 
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where 4> is the angle of observation between the occulted 
star and opposition, and «e = 29.8 kms -1 is the veloc- 
ity of the Earth around the Sun. The event width (the 
length of the chord across the occultation shadow where 
it crosses the telescope) is given by 



W = VH 2 - b 2 , 

where b is the impact parameter, and H is the event 
cross section, which we define as the diameter of the first 
Airy ring of the diffraction shadow , and which can be 
approximated by (jNihei et al.ll2007j ) 
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where 0* is the angular size of the occulted star. 

For the very small objects (D < 1 km) targeted by this 
survey and stars with small angular diameters (the vast 
majority of stars covered by this survey), the minimum 
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At 43 AU, H min « 5 km. At opposition (0 = 0), 
v IC \ rs 25 kms , and with 6 = the resulting event 
duration is 200 ms, with the duration getting smaller as 
b is increased. 

This is illustrated in the right panels of Fig.Q] The 
top panel shows a slice through the simulated diffraction 
shadow, assuming the KBO crosses the line of sight to 
the star (6 = 0). Note the event width, given by the 
distance between the two top peaks, is about 5 km. (In 
this case, the event width is dominated by the Fresnel 
scale, so the approximation given in Eq.|3] applies.) The 
bottom panel shows this event as it would be measured 
by the TAOS system at 5 Hz. The solid line shows the 
lightcurve which would be measured if the sampling was 
in phase with the event, that is, the measurement at t = 
is centered on the epoch when the KBO is centered on 
the line of sight to the target star. The dotted line shows 
the same event with the sampling out of phase with the 
event. 

Typical occultation events for small KBOs at oppo- 
sition will thus manifest themselves in the TAOS data 
as a reduction in flux on one or two consecutive pho- 
tometric measurements of a star with an otherwise flat 
lightcurve. However, when observing away from oppo- 
sition, the relative velocity decreases, as indicated by 
Eq.[TJ Furthermore, TAOS is also sensitive to objects 
more d istant than the K uiper Belt. The discovery of 
Sedna (|Brown et alj 120041 ) indicates the possibility of a 
large, heretofore unknown population of ob jects at dis- 
tances of 100 to 1000 AU (see iWang et alj ' l2009bL and 
references therein). Such events will also be of a longer 
duration due to the increased angular size of the Fres- 
nel scale, as indicated by Eq.[5J Fig. [5] shows a simu- 
lated lightcurve with an occultation by a 5 km object at 
500 AU, observed at 70° from opposition. The width of 
the event is about 22 km, and with a relative velocity of 
about 9 kms -1 , the event duration is about 2.5 seconds, 
corresponding to a total of 13 measurements at our ca- 
dence of 5 Hz. (Once again, the approximation for the 
event width given in Eq. [3] applies.) 

The goals of the TAOS statistical analysis described in 
this paper are to find as many such events as possible, 
minimize the false positive rate, and provide a method to 
robustly estimate the statistical significance of any candi- 
date event. The statistical technique should be sensitive 
to both the one and two point events shown in Fig.Q] and 
the longer duration events such as that shown in Fig.[2j 
In the following sections the application of rank statis- 
tics to meet these goals will be described. The discussion 
will begin with a focus on single point events, and the 
extension of the statistical technique to multi-point ob- 
servations will be presented in £13.31 

3. RANK STATISTICS 

The idea of rank statistics is quite simple, and is best 
introduced with a single series. Take a time series of flux 
measurements /i, . . . , /n p from one telescope, where N p 
is the number of points in the time series. Replace each 
flux measurement fj with its rank rj. That is, the lowest 
fj will be assigned rank 1 and the highest assigned rank 
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Fig. 1. — Left panel: diffraction shadow projected onto the surface of the Earth from a 3 km diameter KBO at 43 AU. Right panel top: 
perfectly sampled lightcurve assuming zero impact parameter. Right panel bottom: same lightcurve as sampled by the TAOS system at 
5 Hz. Solid curve has measurements centered on event, dotted line shows lightcurve where sampling is out of phase with event. 
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Fig. 2. — Simulated lightcurve with an occultation by a 5 km 
object at 500 AU, measured at 70° from opposition. Diffraction 
features are smoothed out due to finite angular size of the occulted 
star. Dotted line is the infinitesimally sampled lightcurve, and the 
solid line indic ates the lightcurve as would be measured with 5 Hz 
sampling. See Nihei et ah (2007]) for a discussion of occultation 
events from objects at such distances. 

N p . If we use a total of T telescopes, we replace the 
time series for each telescope with its rank within the 
lightcurve from that telescope, giving a set of T rank 
time series r-y. Thus for each time point tj, we have a 
rank tuple 

(rij,...,r T j)- 

If, for each telescope i, the rank r\y follows a uniform 
distribution on {1,...,N P } at each time point tj, and 
if the lightcurves are independent between the dif- 
ferent telescopes, then each rank tuple combination is 
equally likely at each time point, and we can calculate 
exact probability distribution of these rank tuples. The 
calculation of the probability distribution of the raw data 
is impossible to perform on the original time series mea- 
surements, since the underlying distributions of the flux 
measurements /y in each lightcurve are unknown. That 
is, by working with the ranks, we replace something 
unknown, the distribution of the data, with something 
known, the distribution of the ranks. 

A time series is stationary if the distribution of any 
finite subset of the series is invariant under time shift. 



A stationary time series fj is ergodic in mean if, for any 
function G with an expected value E(G(/)) < oo, we 
have the following convergence with probability of 1 (law 
of large numbers): 

n 

lim -yG(/^E(G(/)), 

n^oo Ji L — 4 
i=l 

It can be shown that if a time series fj of length N is 
ergodic in mean, then the distribution of ranks rj /N will 
converge to the uniform distribution for any sequence 
1 < j < N, and it is well known that ergodicity in mean 
can be assured under very weak assumptions on temporal 
dependence within a lightcurve. This pro of is beyond th e 
scope of this paper, but it is published in lCoehlol (|2010l) . 

Therefore, if the data fj are stationary and ergodic in 
mean, this implies that at each time point tj, rj will be 
uniform on {1, . . . , N p }. In addition, if the lightcurves 
from different telescopes are independent, then the rank 
tuples (rij , . . . , rxj ) will be uniform on 

{l,2,...,A p } T , 

and we can calculate exact probability distributions of 
the rank tuples. However, most of the lightcurve sets 
exhibit slowly varying trends that are highly correlated 
between the different telescopes, so our lightcurves are 
neither uncorrelated nor stationary. We have developed 
a filtering algorithm to remove these trends, and in most 
cases the resulting individual lightcurves can be plausi- 
bly modeled as stationary and ergodic in mean. In most 
cases the correlations between the lightcurves from dif- 
ferent telescopes are also removed by the application of 
the filter. This filtering algorithm will be described in 

si . 

Fig-El introduces the rank-rank diagram, which is a 
scatter-plot of the ranks on two telescopes. Similar plots 
will be used throughout the remainder of the paper to 
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Fig. 3. — Schematic of a rank-rank diagram, with JV P = 9. Axes 
are ranks of photometric intensity for individual data points on 
two different telescopes. A single two-telescope photometric mea- 
surement will correspond to a rank doublet on this plot. These are 
marked with the dark squares and labeled with the time at which 
they where measured. For example, note the highlighted rank pair 
at (5,7), measured at time t%. Note that each rank value occurs 
once and only once in the time series for each telescope. 

illustrate various statistical tests. Note that each rank 
must occur once and only once in each time series. Thus 
there must be exactly one point in each row and in each 
column. The ranks within a single lightcurve are thus 
not independently distributed. However, if the condi- 
tions on ergodicity in mean and no dependence between 
telescopes are met, then the rank pairs will be uniformly 
distributed throughout the diagram. 

3.1. The Rank Product Test Statistic 

As can be seen in Fig.[TJ events consistent with occul- 
tations by KBOs will appear as one or two consecutive 
flux drops in all four telescopes. Our test is thus designed 
to find those rank tuples where all of the ranks are small, 
corresponding to a region toward the lower left corner of 
the rank-rank diagram shown in Fig. [3] (expanded to T 
dimensions, where T is the number of telescopes). The 
assumptions on the rank statistics and conditions placed 
on the raw data outlined above allow us to calculate the 
significance level a of various test statistics correspond- 
ing to this region. 

The statistical analysis is designed to use each rank tu- 
ple (rij , . . • , TTj ) to perform a hypothesis test that there 
is an event at time tj . Each measurement can be used 
as a test statistic for the null hypothesis of no occultation 
event versus the alternative that there is an occultation, 
yielding a p- value given by 

The goal is to use the tuple of T p-values at time tj 
to calculate a single test of significance. Fisher proposed 
that the product of the p -values be used as a test statistic 
for this general problem (Fisher 1958; Mosteller & Fisher 
1948). Given the product of ranks at time tj over all 



telescopes T 

T 

i=l 

we define our rank product statistic as 

*(^)- < 4 > 

Event detectio n based on the rank pro duct statistic 
was d escribed in iLehner et al.l (|2006l ) and Zhan g et al. 
(2008). In the description presented in ILehner et al. 
(2006), we made the assumption that the distribution 
of p-values rij/N v was uniform on the continuous in- 
terval [0, 1] when in fact it is uniform on the discrete 
set {1/iVp, 2/iVp, 1}. We found that this assumption 
leads to substantial deviations from the tails of the true 
distribution. For compl eteness, we will desc ribe the sta- 
tistical test presented in lLehner et al.l (|2006l ). followed by 
a description of the correct statistical distribution. 

To calculate the probability distribution of Zj under the 
continuous interval assumption, we define the parameter 




If the p-values r,-j /N p are uniform and continuous on the 
interval [0, 1], this quantity has the probability distribu- 
tion 

p(sij) = e~ s «, 

The probability distribution of the sum of the parameters 
Sij from two telescopes is given by the convolution of the 
two distributions, i.e. 

z 3 =s i? + s 2j 






= z,ie~ Zi . 



For a total of T telescopes, this can be generalized to the 
form 

T 

Zj = Sij , 
i=l 

P(z j ) = f ^r ) zj- 1 e-*i, (5) 

which is simply the T distribution. 

Given that the distribution of p-values is discrete on 
the interval {1/N P , 2/N p , 1}, the true distribution of 
the rank product can be calculated using the function 
K(n;T, N p ), which we define as the number of ways to 
get a product of n by multiplying T integers (number 
of telescopes) between 1 and iV p (number of points in 
the lightcurvcs) . This function can be calculated numer- 
ically, and we have developed a simple algorithm to cal- 
culate K(n;T, N p ) when n < N p (see Appendix). Some 
values of this function for T = 4 telescopes are shown in 
Table [TJ Note that this function is independent of N p if 
n < N p . 

Rather than using the function K to calculate the 
probability density of the rank product statistic z, it is 
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Fig. 4. — Left panel: event significance calculated by the T distribution approximation (dashed line) and the actual distribution (solid 
line), assuming N p = 27, 000 and T = 4. Right panels: the results of two simulations illustrating the power of the rank product method for 
event selection. On the top panel, the histogram shows the parameter z for T = 3 and 7V P = 27, 000, and the dashed line shows the true 
distribution given the null hypothesis. The rank triplet on the tail has ranks {10, 10, 10}. The bottom plot is the same, but with T = 4, 
and the outlier arises from a rank quadruplet of {10, 10, 10, 10}. 



TABLE 1 

Rank quadruplets used to calculate K(z;T, N p ) for T = 4 and 

z<N p . 



K(l) = 1 K{2) = 4 K(3) = 4 K{4) = 10 K(5) = 4 K(6) = 16 
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simpler to calculate the distribution as a function of the 
rank product y as 

p(y) = ^ T K(y;T,N p ). 
p 

We thus calculate the significance, or p- value, of any can- 
didate event as 

P(Y<y) = ^^>(i;T,iV p ). (6) 
p i=i 

However, for clarity we continue to display results in 
terms of the rank product statistic z because candidate 
events are more easily distinguished on the tail of the dis- 
tribution. Given the relation between z and y, it clearly 
follows that 

¥(Z > z(y)) = P(Y <y). 



Not e that the re s ults p ublished in iZhang et al.1 (|2008l ) 
and IBianco et al.l (|2010l) use the correct probability dis- 
tribution based on the discrete rank distribution. 

A comparison between the discrete distribution and 
continuous approximation is shown in Fig-H] left panel. 
The difference is very significant on the tail. This is 
due to the fact that the most significant rank quadruplet 
possible is (1,1,1,1), which corresponds to z = 40.8, 
while the tail on the T distribution at z > 40.8 is due 
to values of < r < 1 /N p , which are allowed under the 
assumption of a continuous distribution of ranks. 

The power of the rank product method is shown in 
Fig. d] (right panel). A three telescope (top) and a four- 
telescope (bottom) data run were simulated. On the 
three telescope run, an event was added with a rank 
triplet {10,10,10}, and on the four-telescope run, an 
event was added with ranks {10,10,10,10}. The four- 
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telescope event has a p-value of 3.7 x 10~ 12 under the 
null hypothesis, while the three-telescope event has a p- 
value of 1.5 x 10 -9 . This simple example illustrates the 
value of using multiple telescopes, in that the absence 
of the fourth telescope decreases the significance of the 
event by more than 2,000, while keeping the false positive 
rate fixed. 

The rank product test statistic is based on subsets of 
the rank tuples where events would plausibly be expected 
to be found. However, in general, the subset of rank 
tuples that provides the most sensitive detection is com- 
posed of those tuples which are most likely in the event of 
an occultation. We could imagine identifying this sub- 
set by running an enormous simulation of occultations 
which produced a probability for each of the T Np tuples. 
The rejection region for the test would then be composed 
of the quadruplets with largest probabilities, the num- 
ber being determined by the desired false positive rate. 
Note that the rejection region might not be symmetric in 
the telescopes (invariant to the telescope labels), which 
might be desirable if lightcurves from some telescopes 
had much better signal to noise ratios than from others. 
We have not carried out such a simulation, but a modest 
simulation indicates that the rejection region determined 
by the rank product statistic is sufficient for the purpose 
of event detection. 

3.2. False Positive Rate 

The methodology we employ is to search for an event at 
every time point in every lightcurve set that the survey 
has collected. Hence, the total number of hypotheses 
tested is 

l 

where the sum is over all lightcurve sets I in the data set. 

If we set a significance threshold of a to declare an 
event at time point tj in lightcurve set I, and we use 
the same significance threshold at all times in all light 
curves, then the expected number of declared events due 
to chance would be 

a x iVhyp = ax^ N p (l). 

i 

Therefore, to control false positives we must make a ver y 
small. For the results published in Zha ng et al.l (|2008| ). 
the data set (after diagnostic cuts, see EH.3I for details) 
comprised a total of 2.3 x 10 9 tuples, and the threshold 
used was a = 10~ 10 , which gives a predicted 0.23 false 
positive events. To keep the false positiv e rate low for 
the lar ger data set (9.0 x 10 9 tuples) used in lBianco et al.l 
(2010), we used a — 3 x 10 , corresponding to an ex- 
pected number of 0.27 false positives. 

In all likelihood there will be at most one occultation 
in a lightcurve set, and if an occultation occurred over 
consecutive time points it would only be counted once. 
Hence, one could consider performing a hypothesis test 
over the entire light curve set rather than at each time 
point by looking at the minimum of the rank product 
over all time points in the light curve set: 



If we test based on j3 at level a 1 , then the expected num- 
ber of false positives is 

^ a' — a' x #{lightcurve sets}. 
i 

The distribution of j3 can be evaluated numerically, but 
it is much easier to work with the rank product at every 
time point. Given a constraint on the false positive rate, 
and given the lengths of the series of interest, and the 
part of the distrib ution we are interested in (the tail), 
it has been found (jCoehloll2010l ) that that there is little 
difference if we work with f3 or with the rank product at 
all time points; the same events will be detected. 

3.3. Detection of Multi-Point Occultation Events 

If we had an occultation from a large object in the 
Kuiper Belt, it would cause a substantial flux drop for 
several consecutive time points, resulting in several val- 
ues of the rank product that pass the significance thresh- 
old. On the other hand, if the object were at 200 AU, it 
might cause a modest flux drop for several time points, 
none of them big enough to pass the threshold. In the 
latter case, it is useful to consider functions of the data 
that look at neighboring time points for detection. 

Let our original time series be /i , ... , /at , and suppose 
we form a new series by 

a 3 = Hfj-k, ■■; fj, ■-, fj+k)- 

For example, b could be a moving average: 

b(fj-k,—ifji—,fj+k) = 2k + i (f j - k + "' + /?+ fe ) 

The series aj might show a larger response at the cen- 
ter of the modest signal than fj, leading to better de- 
tection efficiency. Another possibility for b is to take the 
inner product with some signal. For example, a series of 
event templates could be used as the function b to search 
for occultation events from objects of specific sizes and 
distan c es, which is what w as do ne by iSchlichting et al.l 
(2009). rWan~g et al.l (|2009aft . and iBickerton et al.l (j200lf 

Such manipulation of the data will introduce signif- 
icant autocorrelation into the lightcurves. However, if 
the series fj is stationary and ergodic in mean, and if 
k is small relative to the length of the lightcurves, then 
it follows that a,j will also be stationary and ergodic in 
mean, so the rank product distribution will still be sat- 
isfied. This is because the autocorrelation structure is 
expected to b e the same throughout the lightcurve. See 
iCoehlol ([2010) for a detailed discussion. 

4. LIGHTCURVE FILTERING 

As discussed above, the tests based on rank statistics 
are valid only if the lightcurves from each telescope are 
stationary, ergodic in mean, and independent of those 
from other telescopes. However, in the actual data, sig- 
nificant correlations and non-stationarity are evident, 
as can be seen in the top left panel Fig. [5] Trends 
like those evident in the lightcurves in Fig.[S] can arise 
due to changing airmass and atmospheric transparency 
throughout the duration of a run. The bottom left panel 
of Fig. [5] shows the rank-rank diagram corresponding to 
this lightcurve set (telescopes A and B are shown). Un- 
der the assumption of independence, the points should 
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Fig. 5. — Top left: an unfiltered three-telescope lightcurve set for a single star. Note the correlated variations in the lightcurves. Bottom 
left: rank-rank diagram for telescopes A a nd B. Top right: sa me lightcurve set after filtering. Bottom right: rank-rank diagram after 
filtering. All four plots are reproduced from Zhane ct al. (2008). 



be distributed uniformly across the diagram; clearly this 
is not the case. 

To solve this problem, we apply a mean filter to the 
lightcurves in order to remove the slowly varying trends. 
Each photometric measurement fj in a lightcurve is re- 
placed with 

9 j = fj ~ fj, 

where .f, is defined a s a 3a-clipped (jBertin fc Arnoutsl 
[1991 IDa Costal[T99^ mean taken over a window of size 
W/t which is centered on the point fj. 

After application of the mean filter, we found many 
lightcurves that exhibit fluctuations in variance over 
time. In periods of higher variance, more extreme high or 
low rank values are more likely, and our assumption on 
the uniform distribution of ranks throughout a lightcurve 
is invalid. We thus correct for changes in the variance by 
applying a variance filter, where we replace every point 
gj with 



where the standard deviation <jj is calculated over a win- 
dow of size W a centered on the point gj , and 3<r-clipping 
is applied here as well. 
We want to choose the window sizes to be small enough 



to accurately correct for high frequency trends, but we 
also want them large enough to enable accurate determi- 
nation of / and a. After testing various window sizes, we 
found that — 33 and W a = 151 work best (the vari- 
ance fluctuates much more slowly than the mean, hence 
the larger window size) . 

We note that much work has been done in the past 
on removing such trends from lightcurves, most of which 
involves removing correlated trends in lightcurves from 
diff erent stars in the same series of images (for example, 
see | Kovacs et all [2005; Ta muz et al.l 120051 : iBianco et al.l 
2009). The simpler approach we have adopted works well 
enough for our purposes, but we are considering adopting 
similar techniques for future analysis. 

The top right panel of Fig.[5]shows the same lightcurve 
set shown in the left panel, after filtering. The trends in 
the mean and variance have clearly been removed. The 
rank-rank diagram of the filtered lightcurve set is shown 
in the bottom right panel of Fig.[Sj No dependence is 
evident in this diagram. 

Fig. [6] shows auto-correlation functions (ACFs) after 
application of the mean and variance filters described 
above. Three of the panels show ACFs of lightcurves in 
the TAOS data after filtering (such as those shown in 
the top right panel of Fig. [5]), and one of them shows the 
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Fig. 6. — Auto-correlation plot from four lightcurves. Panels 
(a), (b), and (c): auto-correlations plots from three filtered TAOS 
lightcurves. Panel (d): auto-correlation plot from a synthetic 
white-noise lightcurve, after application of the mean and variance 
filters. Dashed lines are the 95% c.l. limits for what would be 
expected for randomly distributed lightcurves. 



ACF of a synthetic lightcurve of white noise, after the 
same filters have been applied. The auto-correlation is 
insignificant in all cases after a time lag of 6.6 seconds, 
which corresponds to the window size W M of the mean 
filter. The ACF of the simulated lightcurve demonstrates 
that the observed features in the ACFs are consequences 
of the mean filter. The small feature in Fig. [(J, evident at 
time lag of 30 seconds is likely due to the variance filter 
which has a window size of W a = 151 points. The short 
timescale (relative to the length of the lightcurve) of the 
significant auto-correlation features is consistent with our 
modeling of the filtered lightcurves as stationary and 
ergodic in mean, as dependence over longer timescales 
invalidates our assumptions that all possible ranks are 
equally likely at each time point (|Coehloll2010fl . 

While the filter appears to work well on the example 
lightcurve set discussed above, we still need to quantify 
how well it actually works. This is important because 
some data runs may exhibit variations that are not ade- 
quately corrected for by the filters we apply. In partic- 
ular, data runs with extremely rapid fluctuations in the 
mean (due to fast moving cirrus clouds or other phenom- 
ena) will not be removed if the event width is small when 
compared with W^. This is illustrated in Fig. [7j which 
shows a lightcurve set taken during a night with periods 
of fast-moving cirrus clouds. Significant correlations are 
evident in the filtered lightcurves. The corresponding 
rank-rank diagram of telescopes TAOS A and TAOS B 
is shown in Fig.|8l Significant over-dense regions are ev- 
ident in the lower left and upper right corners of this 
diagram. In order for the statistical analysis described 
above to be valid, such data need to be flagged and cut 
from the data set before the application of the rank prod- 
uct test. 

We have thus developed two diagnostic tests to be ap- 
plied to each lightcurve set to assess the quality of the 
data after the application of the filters. We have found 
that phenomena inducing correlations in the lightcurve 
sets tend to affect the entire data run. Therefore, these 
diagnostic tests (described in the following subsections) 
are applied to entire data runs rather than individual 
lightcurve sets. Data runs failing these tests are not con- 
sidered for further analysis. The te sts, described in th e 
following su b section s, were used inlZhang et all ([2008), 
I Wang et all (|2009bt ) and iBianco et all (|2010Q . An im- 
proved version of these tests has been developed for use 
in future analysis runs, and these new tests will be de- 



scribed in 

4.1. Pearson's \ 2 Statistic 

A simple test to determine if the lightcurves in a 
lightcurve set are dependent is to divide the multi- 
telescope rank space into a grid and count the number of 
rank-tuples in each grid element. This is illustrated in the 
left panel of Fig.Hl in the case of two telescopes where 
the rank-rank diagram is divided into a N g x A g grid, 
where N g = 3. With N p = 9 and 9 grid elements, the 
expected number of rank pairs in each grid element is 1 . 

One can then perform a Pearson's \ 2 test on the num- 
ber of rank pairs in each grid element by calculating 



N 1 



* 2 = E 



{Oi - E, 



E; 



where Oi is the observed number of rank-tuples in grid 
element i, and 

AJ 

is the expected number of rank-tuples in grid element i. 
(Note that E, may vary slightly among grid elements if 
N p is not an exact multiple of N g .) 

For a given data run, we expect the distribution of the 
Pearson's x 2 statistic to follow the \ 2 distribution, given 
by 

v ( u v \= I W2)-l -« c /2 (7 s 

where u c = \ 2 an d v is the number degrees of freedom. 
The derivation of v can be illustrated by Fig.[9j It is 
important to note that every rank must appear once and 
only once in the time series for each telescope, and this 
constrains the value of v. The degrees of freedom is the 
number of cells in the grid minus the number of inde- 
pendent constraints. First, the cell counts must sum to 
Ap, giving one constraint. Secondly, the counts in each 
grid row and each grid column must sum to three, giv- 
ing two constraints on the three rows and on the three 
columns which are independent of each other and of the 
first constraint. Thus the total degrees of freedom are 
9 — 2 — 2—1=4. To illustrate, note that in the left grid 
column, the ranks 1, 2, and 3 must appear in telescope 
#1. Therefore, for telescope #2, since there are two dou- 
blets in the bottom left grid element and one doublet in 
the middle left grid element, there must be zero doublets 
in the top left element. The gray grid elements in the 
rank-rank diagram are thus not free parameters. For an 
arbitrary number of telescope T, the number of degrees 
of freedom can be shown to be equal to 

v = N T g - T(N g - 1) - 1. 

4.2. The Hyper geometric Test 

While the Pearson's \ 2 test validates that the rank 
tuples are spread uniformly over {1 . . . A P } T , it is also 
useful to demonstrate that there is no bias towards rank 
quadruplets with all ranks relatively low, since these are 
the target events in the survey. Given a rank limit R, we 
define the variable ith as the number of rank quadruplets 
with nj < R for all telescopes i. This is illustrated in the 
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Fig. 7. — Left panel: An unfiltcred lightcurve set on a night with periods of cirrus cloud cover (during the periods of significant flux 
drops). Right panel: the same lightcurve set after filtering, zoomed in to a period of cloud cover. Significant correlations are evident. Note 
that the correlation is stronger between telescopes TAOS A and TAOS B, which are close together (6 m separation). TAOS D, which is 
about 100 m away, has similar features but they are offset in time. 
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Fig. 8. — Rank-rank diagram (telescopes TAOS A and TAOS B) 
of the lightcurve set shown in Fig. [7] The rank pairs are not uni- 
formly distributed, as there are denser than average regions in the 
lower left and upper right corners. 



case of two telescopes in the right panel of Fig.Hl where 
we choose R = A. In this figure, = 3 is the number of 
rank doublets in the shaded lower left corner of the rank- 
rank diagram. Note that with R = 4 there are exactly 
four rank doublets with rij < R and with r2j < R- The 
probability distribution of the number of rank doublets 
with both ranks < R is given by the hypergeometric 
distribution: 



\(N P -R\ 
J\R-u h J 



Cr) 



(8) 



where u h < R (if u h > R then P = 0). 

To expand this calculation to more than two telescopes, 
we use the law of total probability to calculate 



¥(U i+1 = « h ) : 



l=u h 

xF(Ui = l)], 



(9) 



where Uh is the number of measurements with r < R on 
all telescopes 1 to i + 1. The conditional probability is 



defined as 



¥(U i+1 = u h \Ui = I) - 



>A(N P -R\ 

J \ l-u h J 



ft 



(10) 



Given that each rank must occur exactly once for each 
telescope, 

P(L/i = I) = Sm, 

and one can thus expand Eq.[9] to include an arbitrary 
number of telescopes. 

4.3. Application of Diagnostic Statistics 

To date, the TAOS project has only a nalyzed data 
sets with lightcuryes from three telescopes dBianco et al.l 
[Mot IWaTigeraIll2009bt IZhang et all 120081) . Therefore 
we only describe the application of the diagnostic tests 
to three-telescope data. 

For each data run, we apply both the Pearson's % 2 
statistic u c and the hypergeometric test statistic itj, to 
each lightcurve set. For the Pearson's \ 2 test, we use 
a grid size of A^g = 5, which corresponds to a total of 
v = 112 degrees of freedom. For the hypergeometric 
test, we set R = N p /5, rounding to the nearest integer. 
(A typical 90 minute data run will have iV p = 27, 000, 
but many runs are truncated due to bad weather.) Due 
to the fact that any correlations in the data may not 
show up in lightcurves with low SNR values, we perform 
the diagnostic tests only on those lightcurve sets with 
SNR > 10. Details of the algorithm used to calculate 
SNR v alues of our lightcurves are given in IZhang et al.l 
((20091) . To summarize, we first calculate a 5cr-clipped 
rolling mean similar to that which is calculated in the 
mean filter, and then average the value of the rolling 
mean to get the signal. We then subtract the rolling 
mean from the raw lightcurve, and calculate a 5<r-clipped 
standard deviation of the new lightcurve, which we use 
as an estimate of the noise. 

Even in the case of completely independent lightcurves, 
random chance will give rise to a number individual 
lightcurve sets with aberrant values of the test statis- 
tic. We therefore look at the ensemble of test statistics 
for each data run, and require a match to the theoretical 
distributions. A set of examples is shown in Fig. 1101 The 
top panels show histograms of u c and Mh for all of the 
lightcurve sets in a data run with no evident dependence 
among the lightcurves, and the bottom panels show the 
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Fig. 9. — Left panel: rank-rank diagram illustrating the Pearson's \ 2 test. The rank-rank diagram is divided into a N g X N g grid (N s = 3 
in this case), and the number of rank pairs in each box is tabulated and compared with the expected uniform distribution. Counts in the 
gray elements are not free parameters. Right panel: rank-rank diagram illustrating the hypergeometric test. The test counts the number 
of objects in the lower left corner (dark shaded region) of the rank-rank diagram, in this case with a box size of Ft = 4. Note that there are 
four rank doublets with r\j < 4 and four rank doublets with T2j < 4 (light-shaded regions) since each rank must occur exactly once in a 
lightcurve set for each telescope. In this case there are three rank doublets where < 4 for both telescopes. 
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Fig. 10. — Results of diagnostic tests. Histograms indicate actual data, and solid lines indicate theoretical distributions. Top panels: 
Pearson's \ 2 test (left) and hypergeometric test (right) for a data run with no evident dependence between telescopes. Bottompanels: 
same as above, but for a data run with significant dependence between lightcurves. The lightcurve set shown in Figures [7] and \E\ comes 
from this data run. 



same data for the pathological data run which contains 
the lightcurve sets shown in Figures [7] and |U Clearly, the 
data shown in the top panels match the theoretical dis- 
tribution quite well, while the data in the bottom panels 
do not. 

To determine which data runs are to be rejected, we 
test the goodness of fit of the distribution of test statis- 
tics over the lightcurve sets in a data run to their the- 
oretical distributions. To set a threshold, for each data 
run we calculate the quantity -D max , which is defined as 
the absolute value of the maximum difference between 



the cumulative distribution of measured test statistics 
and the theoretical cumulative probability distribution. 
This is analogou s to the Kolmogorov-Smirnov test (see 
iPress et al.lll994l and references therein). For each data 
run, we calculate -D max for both the Pearson's % 2 test and 
the hypergeometric test. A scatter plot of these values is 
shown in Fig.lTTl 

Data runs that fail either of the two tests are removed 
from the occultation event search. For data runs ex- 
hibiting widespread dependence between the lightcurves 
from different telescopes, we expect the measured dis- 
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Fig. 11. — Scatter plot showing D max values for the Pearson's x 2 
test (x-axis) and hypergeometric test (y-axis). Each point corre- 
sponds to a single data run. For each test statistic, we reject data 
runs with D max > 0.2 (dotted lines). 




Fig. 12. — Illustration of the BBS test. Top panel: original 
lightcurve, divided into five blocks of data (dotted lines). Bot- 
tom panels: four lightcurves with the blocks permuted randomly. 
Blocks are labeled 1 through 5 for reference. 

tributions to differ significantly from the theoretical dis- 
tributions, giving rise to large values of Z) max . Visual 
inspection of several data runs indicated that setting a 
cut on D max > 0.2 allowed us to reject nearly all of the 
runs where the lightcurves exhibit clear, widespread de- 
pendence. 

5. IMPROVED DIAGNOSTIC TESTS 

While the diagnostic tests described in the previous 
section are sufficient to remove nearly all of the data 
runs with significant dependence between lightcurves 
from different telescopes, they suffer from some limi- 
tations which motivated us to improve the techniques. 
First, the threshold D max was chosen somewhat arbi- 
trarily after visual inspection of many data sets, since 



there is no way to determine empirically what the op- 
timum threshold actually is. Second, in order for the 
measured statistical distributions to match the theoret- 
ical x 2 an( i hypergeometric distributions, the original 
time series data in the lightcurves are required to be in- 
dependent and identically distributed (i.i.d.), which is 
a stricter requirement than stationary and ergodic in 
mean. If some auto-correlation structure were present 
in the lightcurves, the lightcurves could still be station- 
ary and independent from each other, however, the test 
statistics would not be expected to match the theoret- 
ical distributions. Finally, and most importantly, we 
would like to apply the same tests to lightcurves when 
searching for multi-point events. As discussed in ^3.31 
for such event searches we would take a moving average 
of the lightcurve data, or perhaps take the inner product 
of the lightcurve with some event template. Such fil- 
tering will induce auto-correlations into the lightcurves, 
and increase the SNR values as well. If we increase the 
SNR enough, some insignificant correlations between the 
lightcurves might in fact become significant in the fil- 
tered data. So it would be useful to apply the diagnostic 
tests to the data runs after the application of these filters. 
However, the introduction of significant auto-correlations 
into the lightcurve data will more or less guarantee that 
all of the data runs will fail the diagnostic tests since the 
lightcurves will not be i.i.d. 

We have thus devel oped a new te chnique based on the 
Blockwise Bootstrap (Kiinsch 1989) method (hereinafter 
BBS), which uses both the Pearson's \ 2 statistic u c and 
the hypergeometric statistic Wh, described in the previ- 
ous section, but requires no assumptions about the the- 
oretical distributions for either statistic. The BBS test 
is implemented as follows. First, for a given lightcurve 
set, we calculate both u c and zt n . Then we divide each 
lightcurve in the lightcurve set into 100 subsets, or blocks, 
of data. We then permute the blocks randomly, with 
each lightcurve in the lightcurve set undergoing a differ- 
ent random permutation, and recalculate the diagnostic 
test statistics. We repeat this step a total of 99 times, 
and we are thus left with 100 statistical measurements 
for each of the diagnostic tests. 

This is illustrated schematically in Fig.[l2j The top 
panel shows the original, unpermuted lightcurve, divided 
into five blocks. The blocks are labeled 1 through 5 for 
clarity. The bottom four panels show the same lightcurve 
with the five blocks randomly permuted. Note that the 
data within each block remain unchanged. 

For each diagnostic test, we have now calculated 
100 different values, one for the original lightcurve set 
and 99 for the randomly permuted lightcurve sets. If 
we permute the blocks we still preserve the stationary 
structure as long as the block size is large in compari- 
son to the time scale of any autocorrelation. So if the 
lightcurves are independent, our 100 values are like 100 
independent draws from the same distribution. Thus, if 
we rank each of the series of 100 test statistics from 1 
to 100 (where a rank of one corresponds to the largest 
value of u c or u n , which would be the worst match to the 
expected distribution), the ranks should be uniform and 
we can calculate associated p-values as 
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and 

nv < Vh ) = ^, (12) 

where v c and Vh correspond to the ranks of the test statis- 
tics u c and «h from the unpermuted (original) light curve 
sets. 

The BBS test is then performed on every lightcurve 
set in a data run with SNR > 10. In the case of a data 
run which does not exhibit any strong dependence be- 
tween the telescopes, the p-values v c and Vh should be 
uniformly distributed on {0.01, ...,1}. However, in the 
case where there is significant dependence between the 
telescopes, we would expect the distributions of v to be 
clustered at small values, since any correlation between 
the telescopes would disappear when the blocks are ran- 
domly permuted. This is illustrated in Fig.[T31 which 
shows a histograms of the values v c from each lightcurve 
set in two different data runs. The histogram in Fig.[T3k 
shows the results from a data run with little dependence 
between the telescopes, while panel Fig.[l3b shows a data 
run with strong dependence. 

In order to quantify the amount of dependence be- 
tween the telescopes in a data run, we define two new 
test statistics, w c and Wh, which are defined as the num- 
ber of lightcurve sets in a data run with v c < vt and 
i>h < Vt respectively, where we choose v t = 0.10 (This 
corresponds to the lowest bin in the histograms shown in 
Fig-US])- In the case of independence between telescopes, 
the distributions of w c and follow the binomial dis- 
tribution of the form: 

p(w h )=( L )vr(l-Vt) L ~ w \ 

\ w hJ 

where L is the number of lightcurve sets with SNR > 
10 in the data run which are used to calculate the test 
statistics u c and Uh- Using these distributions, we can 
calculate two test statistics for the entire data run, which 
we define as 

x c (w c ) = P(W > w c ), 
x h (w h )=P(W > w h ). 

For the data run in Fig.[T3k. we thus calculate x c = 0.85, 
while for the data run in Fig. ll3b . we have x c = 7.6 x 

io- 7 . 

We can now reject a data run for significant depen- 
dence by setting thresholds on x c and x^. In the ab- 
sence of any significant dependence, the values of x c 
and a;h should be distributed uniformly on the interval 
[0,1]. Plots of the distributions of x c and Xh statistics 
are shown in Fig.[TH In order to illustrate the applica- 
tion of the BBS test to multi-point occultation searches, 
we also show the distributions after taking moving aver- 
ages on the lightcurves with bin sizes of 5 and 15. The 
histograms shown have a bin size of 0.001, and with a 
total of 524 data runs we expect a value of about 0.5 
for each bin. However, note the large number of counts 
in the lowest bins. These are the lightcurve sets that 

15 Tests have shown that as long as Vt is relatively small, the ex- 
act value chosen for vt has no significant effect on the final results. 



show dependence between telescopes. Note that some of 
the histograms show a slight excess in the second bin as 
well. By rejecting all data runs that appear in the first 
two bins, we are clearly rejecting nearly all of the data 
runs exhibiting widespread dependence between the tele- 
scopes. Note that we only expect a total of one data run 
in the first two bins from random chance. Furthermore, 
note that the larger the bin size on the moving average, 
the more data runs that are rejected. This is because 
of low level correlations that are insignificant in the un- 
binned data, but become significant in the binned data 
due to the increased SNR of the binned lightcurves. 

The BBS test is clearly a superior method to the sim- 
ple comparison of the test statistics to their theoretical 
distributions. It is very clear where the thresholds on 
x c and Xh should be, the test will not reject data runs 
where the lightcurves are stationary but not i.i.d., and 
the tests are capable of robustly rejecting data runs when 
performing searches for multi-point occultation events. 

6. CONCLUSION 

We have developed a technique to search for extremely 
rare coincident events in voluminous multivariate (multi- 
telescope) time series data. Using rank statistics, this 
technique enables robust determination of event signifi- 
cance and false positive rate, independent of the under- 
lying noise distributions in the time series data. This 
method has been used to search for rare occultation 
events by KBOs in over 500 data runs comprising a to- 
tal of nearly 370,000 lightcurve sets. Furthermore, we 
have developed a method to test for widespread depen- 
dence between lightcurves in a data run, which allows us 
to reject runs with inherent characteristics which could 
possibly give rise to a larger false positive rates. We note 
that while the method described in this paper is sufficient 
for the calculation of the rate of false positive events that 
arise due to random statistical chance, it is not capable of 
estimating the background even t rate due to syste matic 
errors in the TAOS photometry ([Zhang et al.ll2009| ). For 
example, tracking errors or moving objects in the images 
could give rise to false detections in the data set. A de- 
scription of how such backgroun d events are handled is 
described in lBianco et al.l (|2010D . 
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Fig. 13. — Panel (a): distribution of p-values v c from the Pearson's x 2 test from a good data run. Panel (b): same plot for a rejected 
data run. 
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APPENDIX 
EVALUATION OF THE K FUNCTION 

Here we present an algorithm to evaluate K(x;k,n), the number of ways to get a product of x by multiplying k 
integers between 1 and n, which is applicable when x < n. 

Note that K(l; k, n) = 1. For x > 1, consider the prime decomposition of x where the p's arc unique primes and d 
is their degree so that: 

X=p d l XJJj 2 X ••• XPm™- 



where m is the total number of prime factors of x. 
We claim that 



K(x;k,n) = H 



i=l 



d l + k - 1 
k-l 



Proof 

Suppose Ai x A 2 x . . . x Ak = x and take prime decompositions of each number: 

A 1 =pt- 1 xp d 2 1 ' 2 x...xpt" n 
A 2 =p d ^xp d 2 ^x...xpt- 



Note that 



Hence, 



where 



k 

d h3 = d V i' 



K(x;k,n) = Y[S(di-,k), 

i=l 

S{d; k) = # ((ai, . . • , a k ) € {0, .., d} k s.t. ^ a, = d] 



i=i 
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Fig. 14. — Panel (a): distribution of p-values from all data runs in the data set described in [Bianco et al.1 J20TOT) for both the \ 2 and 
hypergeometric tests, with the original data and data after the application of moving average filters with window sizes of 5 and 15. Inset 
plots are zoomed in to the lowest p-values. 



Fig. 15. — Schematic illustrating the calculation of the function S(d; k). In this case, d = 10, as indicated by the top row of 10 dots. The 
bars in the bottom two rows indicate possible ways to split the ten dots into four addends. The second row indicates the tuple (4, 0, 1, 5), 
and the third row indicates the tuple of (2, 1,4,3). 



15 



is the number of ways to get a sum of d by adding k integers where < k < d. The calculation of the function S is 
best illustrated with an example. Consider the case of d = 10 and k = 4. If we illustrate the sum d = 10 with 10 dots 
in the top row of Fig. ll5[ the function S is simply the number of ways to divide the dots into 4 groups (using the bars 
shown). For example, the second row of Fig. 1151 corresponds to a tuple of (4, 0, 1, 5), while the third row corresponds 
to a tuple of (2, 1, 4, 3). So the number of possible 4-tuples is the number of ways to choose 3 bar locations in a total 
10 + 3 = 13 possibilities. This gives 



S{d; k) 



1 



For example, to calculate K(6;4,n) where n > 6, we note that 6 = 2 x 3 is the product of two primes to the first 
power, and in the 4 telescope case it is equal to 



1 + 4 - 1 N 2 
4- 1 

2 



= 4 2 
= 16, 

in agreement with Table [TJ Note that, as mentioned above, this formulation is only valid if x < n. For example, 
K (6; 4, 5) = 12, since any of the rank tuples with a rank value of 6 would be impossible in a lightcurve set containing 
only 5 points. 

As a second example, for the case of 360 = 2 3 x 3 2 x 5, we have three primes with degrees 3, 2, and 1. We thus have 
(for n > 360) 



/sT(360;4,n) 



3 + 4-l\/2 + 4-l\/l + 4- l 
4-1 )\ 4-1 )\ 4-1 

:» 

= 20 x 10 x 4 
= 800. 
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